**Recipe: OpenLB for Intel® Xeon Phi™ Coprocessors**

Authors: Zhou, Shan (Intel)

**I. Overview**

This article provides a recipe for how to obtain, compile, and run an optimized version of olb-0.8r0 on Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors.

The source for this version, olb-xeon-phi-0.8.0, can be downloaded from https://github.com/app-on-mic/openlb-on-mic

**II. Introduction**

OpenLB is an open source C++ library for lattice Boltzmann method (LBM) simulations, released under the GNU General Public License version 2 (GPLv2). It can be used to simulate physical phenomena, with an emphasis on fluids, and its kernel module is based on a variety of lattice Boltzmann models. OpenLB is parallelized with MPI and OpenMP. More information about the OpenLB project is available at <http://optilb.org/openlb/>.

This project, olb-xeon-phi-0.8.0, optimizes the performance of the two-dimensional part of the kernel module on both Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors, and supports a symmetric execution model that uses both architectures in cooperation.

Optimizations in this package include:

(1) Cache blocking to improve data locality.

(2) Removing unnecessary computation, especially recomputation of constant (loop-invariant) values inside the hotspot loop.

(3) Replacing division with multiplication by the reciprocal.

(4) Loop unrolling to reduce memory accesses.

(5) Setting OpenMP thread affinity to reduce thread migration.
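As a rough illustration of optimizations (1) to (3), the following is a minimal sketch on a toy relaxation kernel. It is not taken from the olb-xeon-phi sources: the function names `relax_naive` and `relax_blocked`, the update rule, the row-major layout, and the default tile size of 64 are all assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Toy BGK-style relaxation on an nx-by-ny grid stored row-major:
//   f <- f - (f - feq) / tau,  with tau = 3 * nu + 0.5.
// Hypothetical kernel for illustration only; not OpenLB code.

// Baseline: recomputes tau and performs a division for every cell.
void relax_naive(std::vector<double>& f, const std::vector<double>& feq,
                 std::size_t nx, std::size_t ny, double nu) {
    for (std::size_t y = 0; y < ny; ++y) {
        for (std::size_t x = 0; x < nx; ++x) {
            const double tau = 3.0 * nu + 0.5;   // loop-invariant work
            const std::size_t i = y * nx + x;
            f[i] -= (f[i] - feq[i]) / tau;       // one division per cell
        }
    }
}

// Optimized sketch: invariant hoisted out of the loops, division replaced
// by a multiply with the reciprocal, and the grid traversed in square tiles.
void relax_blocked(std::vector<double>& f, const std::vector<double>& feq,
                   std::size_t nx, std::size_t ny, double nu,
                   std::size_t block = 64) {
    assert(block > 0);                           // a zero tile size would never advance
    const double omega = 1.0 / (3.0 * nu + 0.5); // computed once
    for (std::size_t y0 = 0; y0 < ny; y0 += block) {
        const std::size_t y1 = std::min(y0 + block, ny);
        for (std::size_t x0 = 0; x0 < nx; x0 += block) {
            const std::size_t x1 = std::min(x0 + block, nx);
            // Visit all cells of the current tile before moving on.
            for (std::size_t y = y0; y < y1; ++y) {
                for (std::size_t x = x0; x < x1; ++x) {
                    const std::size_t i = y * nx + x;
                    f[i] -= (f[i] - feq[i]) * omega;
                }
            }
        }
    }
}
```

Both functions compute the same update up to floating-point rounding, since multiplying by a reciprocal is not bit-identical to dividing. In this single-pass toy the tiling only shows the loop structure. Blocking pays off when several operations, such as neighbor accesses during streaming, touch the same tile while it is still in cache.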

Other modifications:

A new workload, cylinder2d-new, is added, based on the example workload cylinder2d:

1. The initialization and part of the result output of the example workload cylinder2d are removed, so that the benchmark focuses only on the LBM-based kernel.
2. The lattice size of cylinder2d is enlarged to benefit from the many-core architecture.

The original olb-0.8r0 package can be downloaded from:

<http://optilb.org/openlb/download>

**III. Preliminaries**

1. To build this package, install the [Intel® MPI Library](http://software.intel.com/en-us/intel-mpi-library) 5.0.1.035 and [Intel® C++ Composer XE](http://software.intel.com/en-us/intel-composer-xe) 15, or higher versions, on your host system. Your host system must also have the Intel® MPSS for Linux\* installed (it is installed separately from Intel® C++ Compiler XE).

2. Download olb-xeon-phi-0.8.0.tar.gz from https://github.com/app-on-mic/openlb-on-mic

3. Install NFS and start the NFS service, export the /home directory, and mount it to the Intel® Xeon Phi™ coprocessor:

> service nfs start

> vi /etc/exports

Add:

/home 172.31.0.0/16(rw,insecure,no\_root\_squash,async)

> exportfs -au

> exportfs -ar

> showmount -e

Export list for Host:

/home 172.31.0.0/16

Mount the /home directory to the Intel® Xeon Phi™ coprocessor:

> service mpss stop

> micctrl --addnfs=/home --dir=/home

> service mpss start

4. Set up the Intel® MPI Library and Intel® C++ Compiler environments:

> source /opt/intel/impi/<version>/bin64/mpivars.sh

> source /opt/intel/composer\_xe\_<version>/bin/compilervars.sh intel64

> vi ~/.bashrc

Add:

export I\_MPI\_MIC=enable

export I\_MPI\_MIC\_POSTFIX=\_mic

export I\_MPI\_FABRICS=shm:tcp

export I\_MPI\_PIN=enable

Then source ~/.bashrc (or open a new shell) so that the settings take effect.

**IV. Compile olb-xeon-phi**

1. Upload the Intel® MPI Library and Intel® C++ Compiler components to the Intel® Xeon Phi™ coprocessor:

> scp /opt/intel/impi/<version>/mic/bin/mpiexec mic0:/bin/

> scp /opt/intel/impi/<version>/mic/bin/pmi\_proxy mic0:/bin/

> scp /opt/intel/impi/<version>/mic/lib/lib\*.so\* mic0:/lib64/

> scp /opt/intel/composer\_xe\_<version>/compiler/lib/mic/\*.so\* mic0:/lib64

> scp /opt/intel/composer\_xe\_<version>/tbb/lib/mic/\*.so\* mic0:/lib64

> scp /opt/intel/composer\_xe\_<version>/mkl/lib/mic/lib\*.so mic0:/lib64

2. Unpack the source code to any directory under /home and build the library for the Intel® Xeon® processor and the Intel® Xeon Phi™ coprocessor.

* Build the static library libolb.a for the Intel® Xeon® processor

> tar -xzvf olb-xeon-phi-0.8.0.tar.gz

> mv olb-xeon-phi-0.8.0 olb-xeon-phi-0.8.0-cpu

> cd olb-xeon-phi-0.8.0-cpu

> ./build-cpu

* Build the static library libolb.a for the Intel® Xeon Phi™ coprocessor.

> tar -xzvf olb-xeon-phi-0.8.0.tar.gz

> mv olb-xeon-phi-0.8.0 olb-xeon-phi-0.8.0-mic

> cd olb-xeon-phi-0.8.0-mic

> ./build-mic

3. Compile the test workload

* Build the test workload cylinder2d-new for the Intel® Xeon® processor

> cd olb-xeon-phi-0.8.0-cpu/example/cylinder2d-new

> ./build-for-cpu

It will create the binary file cylinder2d.

* Build the test workload cylinder2d-new for the Intel® Xeon Phi™ coprocessor.

> cd olb-xeon-phi-0.8.0-mic/example/cylinder2d-new

> ./build-for-mic

It will create the binary file cylinder2d\_mic.

**VI. Run the test workload on Intel® Xeon Phi™ coprocessor**

> ./cylinder2d-mic-run

This script runs cylinder2d natively on the Intel® Xeon Phi™ coprocessor.

**VII. Run the test workload on both the Intel® Xeon® processor and Intel® Xeon Phi™ coprocessor in symmetric mode**

1. For one Intel® Xeon® processor and one Intel® Xeon Phi™ coprocessor, do this:

> cd olb-xeon-phi-0.8.0-cpu/example/cylinder2d-new

> cp olb-xeon-phi-0.8.0-mic/example/cylinder2d-new/cylinder2d\_mic ./

> ./cylinder2d-cpu-and-mic

**VIII. Performance gain**

For the cylinder2d workload described above, the following graph shows the speedup achieved with olb-xeon-phi-0.8.0. Running the optimized code on one Intel® Xeon Phi™ coprocessor plus a 2-socket Intel® Xeon® processor E5-2697 v2 achieves up to a 1.53x speedup over the original code (olb-0.8r0, 12 ranks) running on the 2-socket Intel® Xeon® processor E5-2697 v2 alone. On the Intel® Xeon® processor alone, the optimized code shows no obvious improvement over the original code because of the memory bandwidth limitation.

* 2S Intel® Xeon® processor E5-2697 v2, baseline OpenLB (12 ranks)
* 2S Intel® Xeon® processor E5-2697 v2, optimized OpenLB (12 ranks)
* 2S Intel® Xeon® processor E5-2697 v2 (12 ranks) + Intel® Xeon Phi™ coprocessor (12 ranks \* 20 threads per rank) (pre-production HW/SW)

Testing platform configuration:

Server with Intel® Xeon® Processors E5-2697 v2: Two sockets, 12-core, 2.7 GHz, 64GB DDR3-1600, 8.0 GT/s, OS version: Red Hat Enterprise Linux Server release 6.2, Intel® Turbo Boost Technology enabled, Intel® Hyper-Threading Technology (Intel® HT Technology) enabled

Intel® Xeon Phi™ Coprocessor: 7120A, 61-core 1.238 GHz, 16GB GDDR5-5500, 5.5 GT/s, MPSS 3.3, Flash version 2.1.02.0390, uOS version : 2.6.38.8, ECC enabled, Intel® Turbo Boost Technology enabled

For more information go to http://www.intel.com/performance

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products.

Optimization Notice

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

By using this document, in addition to any agreements you have with Intel, you accept the terms set forth below.

You may not use or facilitate the use of this document in connection with any infringement or other legal analysis concerning Intel products described herein. You agree to grant Intel a non-exclusive, royalty-free license to any patent claim thereafter drafted which includes subject matter disclosed herein.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

A "Mission Critical Application" is any application in which failure of the Intel Product could result, directly or indirectly, in personal injury or death. SHOULD YOU PURCHASE OR USE INTEL'S PRODUCTS FOR ANY SUCH MISSION CRITICAL APPLICATION, YOU SHALL INDEMNIFY AND HOLD INTEL AND ITS SUBSIDIARIES, SUBCONTRACTORS AND AFFILIATES, AND THE DIRECTORS, OFFICERS, AND EMPLOYEES OF EACH, HARMLESS AGAINST ALL CLAIMS COSTS, DAMAGES, AND EXPENSES AND REASONABLE ATTORNEYS' FEES ARISING OUT OF, DIRECTLY OR INDIRECTLY, ANY CLAIM OF PRODUCT LIABILITY, PERSONAL INJURY, OR DEATH ARISING IN ANY WAY OUT OF SUCH MISSION CRITICAL APPLICATION, WHETHER OR NOT INTEL OR ITS SUBCONTRACTOR WAS NEGLIGENT IN THE DESIGN, MANUFACTURE, OR WARNING OF THE INTEL PRODUCT OR ANY OF ITS PARTS.

Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked "reserved" or "undefined". Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The information here is subject to change without notice. Do not finalize a design with this information.

The products described in this document may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order.

Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or go to: <http://www.intel.com/design/literature.htm>

\* Other names and brands may be claimed as the property of others.

Copyright © 2014 Intel Corporation. All rights reserved